1. Introduction


This paper explores how Yoruba gamers map lexical meaning to videogame music in their 
native language. Sound-meaning mapping is an obligatory aspect of spoken language 
(Hockett, 1960; de Saussure, 1974; Dingemanse et al., 2015) but an optional aspect of 
music. For example, it has been repeatedly shown that humans associate affective and
non-affective meaning with various pitch and tempo patterns (Feld, 1984; Hacohen and Wagner,
1997; Koelsch et al., 2004; Patel, 2008; Garcia et al., 2014; Neumeyer, 2015). Prominent 
examples of sound-meaning mapping outside spoken language are musical speech surro- 
gates, such as talking drums and flutes of West Africa, Asia and America (Stern, 1957; 
Carrington, 1971; Ekwueme, 1974; Bradley, 1979; Lo-Bamijoko, 1987; Poss, 2005; Win- 
ter, 2014; Poss, 2012; Seifart et al., 2018; McPherson, 2018). In this case, musicians map 
linguistic meaning to music melodies and rhythm by imitating linguistic features with mu- 
sical instruments. Consequently, musicians from these cultures are able to communicate in 
a language-like form via musical instruments. 

Studies of music communication in cultures with musical speech surrogates have 
tended to focus more on musical imitation of language, but it is also important to examine 
the “opposite”, which involves verbal imitation or interpretation of music melodies. Using 
linguistic instrumentation and methodology, the present study contributes to research on 
verbal interpretation of musical sounds in a culture with speech surrogates. The present 
work is based on conventionalised verbal imitations of the music in the Super Mario game 
by Yoruba gamers. 

I will argue that Yoruba gamers mapped lexical meaning to the videogame music by 
(i) verbally imitating the pitch sequences of the videogame music, (ii) using the scenes of 
musical performance in the videogame and (iii) projecting the social expectations about
music in Yoruba culture onto the function of the videogame music. This work is of interest for
three main reasons. First, the role of vocal imitation in mapping lexical meaning to music 
melodies suggests a parallel between language and music, considering that sound sym- 
bolism in human language (e.g., onomatopoeia) also involves resemblance-based mapping 
between language and language-external acoustic sources (Ramachandran and Hubbard, 
2001; Dingemanse et al., 2015; Akinbo, 2021a,c). Most of the evidence for the verbal im- 
itation in musical meaning is intra-cultural (Patel and Iversen, 2003; Villepastour, 2014; 
James, 2021), but the verbal interpretation of the videogame music by Yoruba gamers
presents inter-cultural evidence for verbal imitation in musical meaning. The second area
of interest is the role of context in mapping linguistic meaning to language-external sounds. 
Just as context contributes to meaning in natural language (Eberhard et al., 1995; Tanen- 
haus and Trueswell, 1995), there is evidence to suggest that context also plays a crucial 
role in mapping lexical meaning to music melody (Villepastour, 2014). However, the sim- 
ulation of context in experimental conditions poses a serious challenge for the study of 
musical cognition (Valsiner, 1994; Parrott and Hertel, 1999). The present study presents
real-life evidence for the role of context in musical meaning, considering that the scenes
of musical performance in the videogame also determine the words that were mapped
to the musical melody. Third, the results of this work are consistent with the claim that 
videogame music supports the perception of game world, player’s involvement and game 
narrative (Zehnder and Lipscomb, 2006; Grimshaw, 2008; Nacke et al., 2010; Sanders and 
Cairns, 2010). While research on videogame music mostly emphasises music structure 
and gameworld contexts (Munday, 2007; Laroche, 2012), the present study indicates that 
the cultural background of gamers is also relevant.

As background to the verbal interpretations, §2 focuses on the aspects of the language
and music tradition that form the basis of the verbal interpretations.
For the present study, the conventionalised linguistic interpretations of some musical motifs 
in the videogame Super Mario Bros by Yoruba gamers were documented. The linguistic 
interpretations and the corresponding musical motifs are presented in §3. The discussion in 
§4 explores why the gamers interpreted the videogame musical motif into Yoruba. To com- 
pare the pitch contour of the linguistic interpretations and the corresponding videogame 
music, native speakers of Yoruba were recruited for a production experiment. The conven- 
tionalised interpretations were presented to the participants with and without the original 
videogame music. They reproduced the conventionalised arrangements in speech and song
modes. The results of their spoken and sung productions were acoustically compared with
the original videogame music. The details of the methodology and results of the acoustic 
investigation are presented in §5. In §6, a formal analysis of the results is presented. The
summary, discussion and the conclusion are presented in §7. 


2. Language Background: Yoruba and its Talking Drum 


Yoruba is a Volta-Congo language with more than 20 million speakers in West Africa and 
most prominently South Western Nigeria (Blench, 2019). Yoruba is a tone language, which 
means pitch contrast brings about lexical or grammatical meaning distinction (Yip, 2002; 
Hyman, 2018). As shown in (1), the language contrasts three tones, namely H(igh), L(ow)
and M(id) (Bamgbose, 1967; Awobuluyi, 1978; Akinlabi, 1985; Pulleyblank, 1986). The
tone-bearing unit in Yoruba is a mora, which is projected by a vowel or a syllabic nasal. 


(1) Tonal minimal set in Yoruba 


a. H   rá   ‘disappear’   ń lọ      ‘it’s going’
b. L   rà   ‘buy’         ǹjẹ́      ‘is it?’
c. M   ra   ‘rub’         kí n lọ   ‘I should go’


By transposing speech tones to tunes and syllables to drum strokes, Yoruba musicians are 
able to communicate via native and non-native music instruments (Euba, 1990; Waterman, 
2000; Villepastour, 2010, 2014; Durojaye et al., 2021b; Gonzalez and Oludare, 2022). Us- 
ing acoustic analysis, Akinbo (2019, 2021b) shows that there is a positive correlation be- 
tween the pitch contours of the speech tones and those of their corresponding musical 
rendition, as shown in Figure 1. 
Figure 1: Mean F0 contours for Yoruba drum and speech tones


Figure 1 shows the drummed and spoken forms of the Yoruba phrase in (2). In the
graph, the y-axis contains the acoustic measurement of pitch contour in F0 (Hz), and the
x-axis contains the tones of the Yoruba phrase. The dark line is for the drum pitch and the
grey line is for the speech pitch. As shown in the graph, the pitch contours of the Yoruba 
phrase and the corresponding drum rendition have similar trajectories. 


(2) dgtinbddé gba *gba 
O. take egg 
‘Ogúnbodé collected a/the garden egg’ (Laniran and Clements, 2003)


Nketia (1963) identifies three modes of speech surrogacy, namely speech mode, signal
mode and dance mode. These modes are comparable to the three forms of drumming in Euba
(1990), which are direct speech form, musical speech mode and song mode. The speech 
mode “involves direct reproduction of the pitches and rhythm of spoken language” (Agawu 
2016:128). Therefore, musicians are capable of musically rendering any verbal utterance 
in speech mode. The signal mode is similar to the speech mode in terms of matching the 
pitches and rhythm of spoken language, but the drum messages in this mode are mostly 
restricted to poetic phrase or epitaph. Unlike the speech and signal modes, the rhythm of 
the dance mode is musical, “metrically constrained, often affiliated with movement, but 


not necessarily of linguistic origins” (Agawu 2016:128). However, the rhythms of spoken 
language and speech surrogacy in Yoruba have not been subjected to instrumental analysis. In
the present study, I only focus on the pitch contours for the verbal imitation of music in 
Yoruba. 

Regardless of the mode of speech surrogacy, listeners have to decode the linguistic in- 
formation communicated via the instrument. The verbal interpretation of speech-surrogate 
messages has not been previously studied but is often alluded to. Previous studies suggest
that consumers of speech surrogates rely on pitch contour and context in their interpretation 
of the messages encoded with speech surrogates (Villepastour, 2010; Sotunsa, 2021), but 
the possibility of simply interpreting speech-surrogate messages based on previous experi- 
ence and associated meaning cannot be ruled out. For example, the findings of Durojaye 
et al. (2021a) indicate that familiarity with speech surrogacy plays a role in distinguishing
speech mode from song mode. Though Durojaye et al. (2021a) is about the categorisa- 
tion of Yoruba drum stimuli as music-like or speech-like, their finding is relevant to the 
interpretation of speech-surrogate messages. In fact, the role of familiarity or previous ex- 
perience in the interpretation of speech-surrogate codes is captured by the Yoruba maxim 
which says, Ọ̀rọ̀ àsọtì ló ń jẹ́ ọmọ mi gbẹ́nà ‘if your child understands your code language,
it is because you both share the secret’ (Isola 1982:44).

To control for the effect of familiarity, the present study is based on the verbal
interpretation of foreign instrumental music which does not involve speech surrogacy. In the next
section, I describe the musical source and the verbal interpretations. 


3. Verbalising the Music Motifs of Super Mario Bros in Yoruba


The data source for the present study is the linguistic interpretation of Super Mario music. 
In this section, I present background information about the game, the music motif of the 
game and the conventionalised verbal interpretations of the music motif by Yoruba gamers. 

Nintendo, a Japanese multinational company, released the console videogame called 
Super Mario Bros in 1985. More than 40 million copies of the game were sold worldwide 
(Stuart, 2010). The protagonist of the “jump-and-run” game is Mario, an Italian
plumber. In the game, which is set in the Mushroom Kingdom, the player takes on the role
of Mario or his brother Luigi in a multiplayer mode. The objective of the game is for Mario 
(or Luigi) to save Princess Toadstool, but for Mario to save the princess, he needs to survive 
the main antagonist Bowser, the forces of Bowser, the dangerous terrains in the game and 
the allotted game time (Nintendo, 1985; Loguidice and Barton, 2012). 

The pianist, Koji Kondo, composed the music motif of the game. According to 
Laroche (2012), the composition of the music motif is a musical reaction to “players’ in- 
tended impressions”. Following Collins (2009: 6), the sounds in Super Mario Bros can be 
classified into interactive and adaptive audios. Interactive audios are “sound events directly 
triggered by the player’s input device”. The sounds which accompany jumping, hitting an 
object, shooting, etc. are examples of interactive audios (see Lerner, 2014). On the other 
hand, adaptive audios are not controlled by the player. Adaptive audio events are cued by 


the game scenes, locations, game time, the presence of non-player characters, etc. There 
are at least eighteen sounds in the game, but the focus of the present paper is four adap- 
tive audio events, namely Overworld/main motif, Flagpole Fanfare, Underworld motif and 
Death motif. Because the Overworld motif is played throughout the game with some tempo 
variations in certain stages, it is also called the main motif (see Lerner, 2014; Schartmann, 
2015). The musical experience of the videogame is one of the major innovations of Super Mario
Bros (Collins, 2009; Lerner, 2014).

The game was very popular in Nigeria in the 1990s and early 2000s, but it is uncertain
when the game first reached the country. An aspect of gaming culture among Yoruba
gamers in Nigeria was the verbal interpretation of excerpts from the Super Mario instru- 
mentals into Yoruba (Ayoola, 2019). The verbal interpretations developed naturally in the 
gaming community. Among all the instrumentals in the game, only the adaptive sounds 
were conventionalised by the gamers. This paper only focuses on the conventionalised in- 
terpretations. To my knowledge, the vocal interpretations of Super Mario music by Yoruba 
gamers have not been studied until now. The available sources on the interpretations are on- 
line posts, which only document fragments of the phenomenon (see the tweets of Akintola 
2011 and Odesanya 2013). The present work is based on my documentation of the verbal 
interpretations, as practiced in Ogun and Lagos, Nigeria. Nintendo did not release the score 
of the music in any of their videogames, including Super Mario Bros. However, authentic
scores, such as the one consulted for this work, exist.¹ The music score excerpts
containing the interpreted motifs are presented in this section. Each syllable of the verbal 
interpretations is aligned with the corresponding note of the interpreted motifs. 
Teba layé ko wa gbobé l’órun
"Make eba on earth, come to heaven for soup" 


Figure 2: The interpretation of Overworld motif by Yoruba gamers 


The excerpt of the Overworld motif and the interpretation are presented in Figure 2. 
The gamers only interpreted an excerpt of the Overworld motif, specifically the second part 
of the first motif played at the very beginning of the first stage (i.e., World 1 level 1). The
first instance of the excerpt plays from 389 to 383 game time, but its repetitions play at
other game times. The linguistic interpretation of the excerpt from the Overworld motif is an
adaptation of a Yoruba idiom which is used as a death threat. The only difference is that
the phrase /gún ’yán/ ‘pound iyán’ in the idiom is replaced with the phrase /tẹ ’bà/ ‘make


¹The full score of the excerpts used in this work can be found at
https://www.ninsheetmusic.org/browse/series/SuperMario.


ẹ̀bà’ (lit. ‘mash ẹ̀bà’) in the interpretation of the Overworld motif.² With the exception of
the Overworld motif, the interpretations of the other music motifs are original, not based on
previously existing Yoruba proverbs.

When Mario drops from the flag after winning the first stage of the game and before 
entering the castle, the Flagpole fanfare is played. The entire Flagpole fanfare consists of
three iterations of a motif and a closing chord gesture. Each iteration of the motif has the
same verbal interpretation, so only one of the iterations is presented in Figure 3. While some
gamers interpret the closing gesture, others do not. For this reason, the closing gesture
is not included in this work.
"(S)he entered and cannot come out" "(S)he entered and cannot come out"


Figure 3: The interpretation of Flagpole fanfare motif by Yoruba gamers 


The second stage of Super Mario Bros is set in the underworld, specifically World 1 
level 2. As soon as Mario enters the underworld, the Underworld motif is played. The
gamers only interpreted an excerpt of the Underworld motif, which plays from 400 to 384 
game time for the first time. The excerpt contains four consecutive iterations of the same 
chord gestures. Figure 4 presents the first two of the four iterations and their verbal inter- 
pretation in Yoruba. The first and second iterations have separate verbal interpretations. 
The third and fourth iterations have the same verbal interpretations as the first and second 
iterations respectively. Other repetitions of the excerpt are heard throughout the game level 
and each repetition has the same verbal interpretations as the initial occurrence. 


Ni bi lo ma ku si    bo ya lo ma dé lé    Ni bi lo ma ku si    bo ya lo ma dé lé


"you are going to die here" "I don't think you will get home" "you are going to die here" "I don't think you will get home"


Figure 4: The interpretation of Underworld motif by Yoruba gamers 


²Iyán is the Yorùbá name for pounded yam; ẹ̀bà is a staple food made from dried grated cassava (manioc).


If Mario dies at any stage of the game, regardless of the cause, the Death motif is 
played. The gamers also linguistically interpret this music motif and the linguistic inter- 
pretation is presented in Figure 5. 


E ré ké ré lo wá se


Figure 5: The interpretation of Death motif by Yoruba gamers 


In general, Yoruba listeners (i.e., Super Mario gamers) are able to linguistically translate
or interpret certain portions of some music motifs in Super Mario Bros. The interpretations
are usually performed as songs. Every Yoruba gamer who has played Super Mario
Bros knows these conventionalised interpretations, but it is uncertain whether someone
who has not heard the music motifs before would arrive at the same interpretations.

The verbal interpretation of the music motifs is similar to Yoruba speech surrogates,
considering that a lexical syllable is mapped to each note of the music motif. However,
the choice of tone on a syllable is unpredictable because a specific tone might be associated
with two different music notes. Studies on Yorùbá speech surrogates show that the
pitch contours of the lexical tones with their phonetic realisations are encoded with Yoruba
drums (Villepastour, 2010; Akinbo, 2019). In §5, I discuss how the musical contours
are mapped to the tone on each syllable of the verbal interpretations. Before focusing on the
tone of the syllables, the next section addresses why the gamers considered the background 
music of the game to be communicative. 


4. Strategies for Mapping Meaning to Music 


The discussion so far has shown that the gamers linguistically interpreted the music motif 
of Super Mario by mapping lexical syllables to the music notes. Instead of mapping lex- 
ical syllables to the music melodies of the music motif in Super Mario Bros, the Yoruba 
gamers could have mapped non-lexical vocables or nonsense syllables to the music motifs, 
as is done in other cultures (see Hughes, 2000; Mullins, 2014; Weir, 2015). So, what deter- 
mines the segmental and tonal properties of each syllable? Why did the gamers use lexical 
syllables instead of vocables? 

The account in this work is that the gamers utilised the situational contexts in the Su- 
per Mario game to determine the segments or words which are mapped to the music motif 
of the game. For example, the phrase “(s)he/it entered and definitely cannot come out”
is mapped to the Flagpole fanfare motif, which plays just as Mario is about to enter
the castle, as shown in (3a). If we assume that the linguistic interpretation of the Flagpole 


motif involves mapping the sequence of tones H M H L L M H M to the music notes of the
Flagpole motif, any phrase with the same sequence of tones, such as the examples in (3b-d),
could have been mapped to the music. Mapping the phrase “(s)he/it entered and definitely
cannot come out” to the Flagpole music motif suggests that the linguistic interpretation
of the Flagpole motif is conditioned by the context of Mario “entering the castle and not 
coming out”. 


(3) Possible interpretations of the Flagpole motif 
Tone: H M H L L M H M


a. O wo’lé kodeé le jade ‘he/she/it entered, cannot come out’

b. Ota mi tate lo w’agbo ‘he/she/it stung me, quickly look for herbs’
c. Déwolé agbado dara ‘Déwolé, corn is good’

d. te; 


Similar to the interpretation of the Flagpole motif, the situational context in the game 
is also utilised in the interpretation of the Death motif. For instance, the Death motif is 
played when Mario dies. Mapping the phrase “it is a dangerous game that you came to 
play” to the music notes of the Death motif correlates with mocking the death of Mario 
for playing a dangerous game, as shown in (4a). The motif of danger is also echoed in 
the linguistic interpretations of the Overworld motif, as shown in (4b), and the Underworld 
motif, as shown in (4c). 


(4) The interpretations of other Super Mario motifs 

a. Death motif 

Tone: M H H H M H M

erékéré lo wa se ‘it is a dangerous game that you came to play’
b. Overworld motif

Tone: H M H L L M H M

teba layé ko wa gbobé l’órun ‘make ẹ̀bà on earth, come to heaven for soup’
c. Underworld motif

Tone: H H M M H H

nibi lo ma ku sí ‘you are going to die here...’

boya lo ma dé ’lé ‘(I) don’t think you will get home’


As mentioned earlier, Overworld is the main motif of the game, and it is the motif 
which is heard at the beginning of level one. The interpretation of the Overworld motif, 
which is “make ẹ̀bà on earth and come to heaven for soup” (4b), is an adaptation of a Yoruba
idiom that is used as a death threat. Considering that the objective of the game is for Mario 
to save the Princess or die trying, the gamers possibly interpreted the Overworld motif as a 
warning or threat which is meant to create tension for the player-controlled character (i.e.,
Mario or Luigi) or more specifically the player. If Mario survives the first level of the game, 
the Underworld motif plays immediately at the beginning of the second level which is set in 
the Underworld. Interpreting the first iteration of the Underworld motif as “you are going 


to die here” is a threat to the player who escaped death at the previous stage of the game. 

The assumption that the interpretations involve mapping a specific tone sequence to 
the motifs can be supported if we consider that the two iterations of the Underworld motif 
have the same sequences of tones but different words in their interpretations, as shown in 
(4c). While the interpretation of the first iteration threatens that Mario is going to “die” at 
the Underworld level of the game, the interpretation of the second iteration casts doubts 
about the possibility of Mario repeating a previous event in the game, which is getting 
home (i.e., the castle). In this case, getting home is a metaphor for winning. Generally 
speaking, the interpretation of the first iteration refers to a possible future event in the 
game and the interpretation of the second iteration connects a previous event (i.e., getting 
to the castle) with a possible future event. The fact that both interpretations have the same 
sequences of tones but different words suggests that the gamers are committed to a specific 
tone sequence in their interpretation and that the same tone sequence can be mapped to 
two or more phrases, as long as the meaning of each phrase matches the context of
the music performance. The tone-sequence requirement is plausibly the motivation for
replacing the MH word [ija] with the LL word [ɛ̀bà] in adapting the popular saying for the
verbal interpretation of the Overworld theme (see §3). 

As mentioned at the beginning of this section, the gamers could have mapped voca- 
bles instead of lexical syllables to the music motifs. Did the Yoruba gamers perceive the 
music motifs as actual words or phrases? My account is that the Yoruba gamers perceive 
the presence of videogame music as the voice of supporters or opponents. For the gamers, 
the themes of the Overworld and Underworld interpretations are death threats to the player- 
controlled character or more specifically the player. While the Death motif mocks the death 
of the player-controlled character, the Flagpole motif is a celebratory motif for the victory 
of the player. This is interesting, because, with a Western ear, one would rather interpret 
the music to express Mario’s mood (e.g., Overworld motif sounds happy, bright and opti- 
mistic) or the atmosphere of the game scenes (as in Whalen, 2004; Laroche, 2012; Schart- 
mann, 2015). But clearly, the Yoruba gamers hear a different “voice” in the music. This 
could be described using the theory of musical persona, i.e., a way to understand music by 
inferring there is a person (or group of persons) speaking through the music (Auslander, 
2006; Cochrane, 2010; Fairchild and Marshall, 2019). 

The fact that the gamers perceive a voice in the videogame motifs can be considered 
an effect of their cultural background, given that the interpretation of the Overworld motif 
is an adaptation of a traditional Yoruba proverb. Another argument for the effect of cultural 
background is that the linguistic interpretations of the videogame motifs are thematically 
comparable to the background music in traditional Yoruba games and entertainment. Like 
the interpretation of the videogame motifs, music in Yoruba culture and other African cul- 
tures is functional, contains textual components, and plays an important role in traditional 
games and events (Adedeji, 1972; Apter, 1998; Agawu, 2001; Green, 2005; Omojola, 2011; 
Campbell, 2015; Agawu, 2016). For instance, in Yoruba societies, instrumental and vocal 
music often feature in wrestling matches such as those of the Oroyeye festival in Ayédé, Ekiti
state, Nigeria. The background music is played by the supporters from the opposing sides. 
The background music either (5a) threatens, (5b) warns or (5c) mocks the wrestlers. The
general goal of the music is to deter the opponent from being victorious. As Apter (1998) 
rightly puts it, the wrestlers fight with their bodies, but their supporters fight with words. 
Similar behaviours are found in sports banter at non-African events and games (see Lee,
1985; McLeod, 2006; Vale and Fernandes, 2018). 


(5) Songs from Oroyeye festival wrestling matches (source: Apter, 1998)


a. Threat    biri gbe.                ‘be carried by the wind’
             oyi gbe...               ‘be dizzy’

b. Warning   omodé é gun’gi Ogéde,    ‘a child that climbs the banana tree,’
             a yo baara, ase wi...    ‘will slide down’

c. Mockery   Adé ré di,               ‘Adé has thumped down,’
             bi oni koyin             ‘like a bunch of palm kernels’


Instruments such as the “talking drum” also play roles similar to those of vocal music at
traditional Yoruba events. For example, during masquerade festivals, Yoruba drummers often
play provocative phrases on the talking drum in order to excite the masquerades or the 
followers. When the phrases in (6) are played on the drum (or spoken by the spectators), 
the followers or the masquerades intensify their dance or any other action. In this sense, 
vocal and instrumental music at traditional Yoruba events are discourse (Agawu, 2001; 
Villepastour, 2010, 2014; Agawu, 2016). 


(6) Yoruba drum phrases (Famiuile, 2018) 


a. oolese bi baba re fi se     ‘you are not as competent as your father’
b. b’óba se pé’mi ni’wo ni     ‘If I were you’
   n ba fapa jo, fapa jo...    ‘I will dance with my hands unceasingly’


Based on the thematic similarities between the verbal interpretations and the music 
at traditional Yoruba events, we can say that the gamers consumed the videogame mu- 
sic using their background knowledge about the functions of background music in Yoruba 
recreational activities with a competitive component. In other words, the interpretive move 
of the gamers can be considered an effect of social expectation, which is “an internalised so- 
cial norm for individuals and organisations...about what people should do” (Hasegawa et al. 
2007: 180). That the Yorùbá gamers utilised their cultural knowledge about game music
in their interpretation of the videogame music is in line with the enculturation account of 
musical interpretation, which suggests that musical meaning is exclusively determined by 
cultural convention and social background (Keil and Feld, 1994; Walker, 1996; Gregory 
and Varney, 1996). For example, Feld (1984) suggests that a listener might relate a musical 
object or event to personal and social conditions, and related experiences where a similar 
sound object can be heard. Gregory and Varney (1996) argue that “the interpretation of 
music is determined more by cultural tradition than by the inherent qualities of the music”. 
Social expectations have also been shown to facilitate comprehension and evaluation of 
spoken language (Rubin, 1992; Devos and Banaji, 2005; Kang and Rubin, 2009; Yi et al.,
2013; Babel and Russell, 2015; McGowan, 2015). 

Another factor that could be at play is that music in videogames is comparable to leitmotif
or film music, i.e., both involve associating musical phrases with objects, events,
people and storytelling in an audio-visual medium (Wagner, 1964; Whalen, 2004; Mun- 
day, 2007), and that audiences from diverse cultures have learned to understand film music 
as being related to the scene, events, and emotions shown in the movie (Thackway, 2002; 
Cohen, 2011). Notably, Koji Kondo and other composers of videogame music liken the 
function of the videogame music to leitmotifs in films (Laroche 2012:29). For the interpre- 
tation of the Super Mario motifs, it is plausible that the Yoruba gamers also extended their 
understanding of film music to videogame music. 

In addition to contextual clues and social expectation, the physical properties of mu- 
sic such as pitch and tempo have also been shown to prime the interpretation of music. 
For example, faster tempo is associated with joy and happiness for Western adult listeners 
(Scherer and Oshinsky, 1977; Kellaris and Kent, 1993; Gagnon and Peretz, 2003; Webster 
and Weir, 2005; Eerola et al., 2013). Similar results are found for child listeners
(Dalla Bella et al., 2001; Mote, 2011). However, these studies are based on music
traditions without speech surrogates. Considering that Yoruba has a music tradition with 
a speech surrogate system which mostly relies on mapping the pitch contours of speech 
tones to the pitch contours of the tunes (Villepastour, 2010, 2014; Akinbo, 2019), it is im- 
portant to inquire whether pitch contours of music melodies influence the interpretation of 
music. Specifically, does the music interpretation involve vocal imitation of non-linguistic 
sounds? For this purpose, we must compare the pitch contours of the music notes in Super 
Mario to those of their conventionalised linguistic interpretation in Yoruba. This investi- 
gation is crucial to the tone sequences of the phrases that are mapped to the videogame 
motifs. In order to compare the pitch contours of the music motif and their corresponding 
linguistic interpretations, I conducted an acoustic investigation on the pitch contours of the 
conventionalised interpretations and the music motif. 

Before turning to the acoustic investigation, the summary is that, to Yoruba gamers, 
the videogame music is communicative as a result of drawing a parallel between the roles of 
background music in traditional Yoruba games and the videogame. In this case, the cultural 
background and experience of the gamers are motivations for the linguistic interpretation 
of the instrumental music. I have also shown that the segmental and morphemic properties 
of the linguistic interpretation are a by-product of assigning meaning to music through 
contextual clues and game events. 


5. Phonetic Aspects of Mapping Meaning to Music 


5.1 Methodology 


I detail my acoustic investigation on the conventionalised interpretation in this section. For 
this study, ten native speakers of Yoruba were recruited in Vancouver, British Columbia. 
The participants were four women and six men who immigrated from Nigeria and had spent
at most seven years in Canada. They were all between the ages of 26 and 45. The stimuli
in this study are the relevant Super Mario motifs and their conventionalised linguistic
interpretations in Yoruba. Because the sounds in Super Mario Bros often overlap, the relevant
Super Mario music motifs were extracted from YouTube videos of the instrumental
music (sources Youcanplayit, 2012a,b,c; Luuul’s, 2016). The motifs were performed on a
piano by the YouTube sources, unlike the original game music, which was performed on a
synthesizer.

The linguistic interpretations with and without the corresponding instrumental music 
of Super Mario were presented to each of the participants. When the participants were 
presented the linguistic interpretations without the instrumental music, they were instructed 
to read the linguistic interpretations in speech mode. When the linguistic interpretations 
were presented with the instrumental music, the participants were instructed to recite the
phrases in the rhythm of the original melody. I refer to this recitation as song throughout
this work. For each participant, each stimulus was repeated three times in speech mode
as well as in song mode. The data were recorded with a SHURE WH30XLR cardioid
condenser (a headset microphone) at the sampling rate of 48.1 kHz in .wav format. The
microphone was attached to a Zoom Q8 camera.




Figure 6: Waveforms and spectrograms for underworld motif in speech mode. Blue vertical 
lines show annotation boundaries 


The notes of the relevant Super Mario music were manually annotated in Praat (Boersma,
2001), and the corresponding tones of the verbal imitation were annotated analogously
(i.e., T1, T2, etc.). To compare the pitch contours of the music and the tones of the
corresponding verbal interpretations, F0 values of the pitch contour were extracted at the
25%, 50% and 75% points of each music note (and analogously of each tone).
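The sampling step can be illustrated with a short Python sketch. It is not the script used in this study: the pitch track, the interval boundaries and the function names (`f0_at`, `sample_interval`) are hypothetical, and the sketch assumes a fully voiced F0 track with strictly increasing time stamps, interpolated linearly between frames.

```python
from bisect import bisect_left


def f0_at(times, f0, t):
    """Linearly interpolate an F0 track (Hz) at time t (seconds).

    Assumes `times` is strictly increasing and every frame is voiced.
    """
    if t <= times[0]:
        return f0[0]
    if t >= times[-1]:
        return f0[-1]
    i = bisect_left(times, t)          # first frame at or after t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)           # position of t between the two frames
    return f0[i - 1] + w * (f0[i] - f0[i - 1])


def sample_interval(times, f0, start, end, fractions=(0.25, 0.50, 0.75)):
    """Return F0 at the given fractions of the annotated interval [start, end]."""
    return [f0_at(times, f0, start + p * (end - start)) for p in fractions]


# Hypothetical pitch track: one F0 estimate every 10 ms, rising linearly.
times = [i * 0.01 for i in range(41)]        # 0.00 s to 0.40 s
f0 = [200.0 + 100.0 * t for t in times]      # 200 Hz at 0 s, 240 Hz at 0.4 s

# Hypothetical annotation: a single tone (T1) spanning 0.10 s to 0.30 s.
t1_samples = sample_interval(times, f0, 0.10, 0.30)
```

Applying `sample_interval` to every annotated note and tone yields three F0 values per unit, which can then be lined up on a normalised time axis.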

To calculate the correspondence between the pitch trajectories of the Mario, speech and song
sources, I used the Pearson correlation coefficient R, which measures the strength and
direction of a linear relationship between two variables. The value of R is always between +1
and -1. The closer the value of R is to +1, the stronger the positive relationship between the
two variables. Conversely, the closer the value of R is to -1, the stronger the negative
relationship between the two variables. If the value of R is 0, there is no linear relationship
between the two variables (see Rumsey 2009 for a basic description of this statistical
measurement). The null hypothesis is that there is no linear relationship between the pitch
trajectories of the two sources being compared. If the p-value is <0.05, the correlation is
statistically significant, which means there is strong evidence against the null hypothesis.
A p-value of > 0.05 indicates weak evidence against the null hypothesis. The Pearson
correlation coefficients were calculated using ggpubr (Kassambara, 2018). In the next
subsection, I present the results of the production experiment.
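For readers unfamiliar with the measure, the following Python sketch shows how R is computed from two equally long sequences of F0 samples. It is an illustration only: the analysis reported here was run with ggpubr in R, and the function names and F0 values below are hypothetical. The sketch also returns the t statistic on which the significance test is based; deriving the p-value from it requires the t distribution with n - 2 degrees of freedom, which is omitted here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing the null hypothesis of no linear relationship."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical F0 samples (Hz) for one motif and one spoken rendition.
mario_f0 = [220.0, 220.0, 247.0, 262.0, 247.0, 220.0]
speech_f0 = [180.0, 185.0, 210.0, 230.0, 205.0, 178.0]

r = pearson_r(mario_f0, speech_f0)
t = t_statistic(r, len(mario_f0))
```

Because R is computed from deviations around each sequence's own mean, a rendition that is uniformly higher or lower than the music can still correlate strongly with it.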


5.2 Pitch of Super Mario music and the corresponding Yoruba interpretations 


The F0 contours of the Super Mario music and their corresponding linguistic interpretations
are presented in this section. As mentioned earlier, the relevant Underworld motif has
two iterations with different interpretations. For ease of identification, the first and second
iterations of the motif are labelled Underworld1 and Underworld2 respectively. The line
plots of Underworld1, Underworld2 and the other music motifs are presented in Figures
7 to 11. For each figure, the y-axis shows the F0 values, and the x-axis shows the
normalised time interval for sequences of tones and music notes. The F0 values are grouped
by source, namely Mario, speech and song. The results from each participant are
grouped by panels (i.e. p1, p3, p#).

Figure 7: Pitch contours of Underworld1 in Mario, speech and song


For the Underworld1, Underworld2, Overworld and Death motifs, the pitch contours of
the Super Mario motifs and those of their verbal interpretations in song and speech modes
are similar at all time intervals. However, in the verbal interpretation of the Overworld
motif by participant 11, the pitch contour of the song source does not match that of the
corresponding Mario source. The pitch contours of the spoken and sung interpretations
are higher than those of the Mario sources for all the participants. When compared to the
speech interpretation, the sung interpretation has higher pitch. This is expected considering
that singing places greater demands on the voice than speech.

Figure 8: Pitch contours of Underworld2 in Mario, speech and song


The pitch contours of the Mario sources in Figures 7 and 8 seem flat when compared
to their spoken and sung interpretations. There are two main factors that contribute to
the distinction. First, the pitch distinctions between the notes of the Underworld motif are
smaller than the pitch distinctions between the tones of the corresponding
spoken and sung interpretations. Second, the target of sound-meaning mapping in a speech
surrogacy system is pitch contour, not pitch height, as shown in Figure 1. It is worth
mentioning that the height of pitch contours in the spoken and sung interpretations of the
music motifs can be conditioned by various linguistic factors. For instance, there is “[an]
overall available pitch range for...[every] speaker...and where in the pitch range...[each]
tone should be produced” (Yip 2002:11). The pitch range of a tone can also be conditioned
by vowel types, the onset of a syllable or the position of the tone-bearing unit (Hombert,
1977; Whalen et al., 1999). For example, the pitch value of a tone is higher when the
tone-bearing unit is a high vowel and lower when the tone-bearing unit is a low vowel (Whalen
and Levitt, 1995).
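The distinction between pitch contour and pitch height can be made concrete with a small Python sketch. It is illustrative only and is not part of the analysis reported here: the F0 values, the reference frequency and the function names are hypothetical. The sketch converts F0 from Hz to semitones and subtracts each sequence's mean, so that two renditions with the same contour at different heights become identical.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz (an assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def centre(values):
    """Subtract the mean so that only the shape of the contour remains."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical melody and a vocal rendition of it one octave higher.
melody_hz = [110.0, 110.0, 130.8, 110.0]
voice_hz = [2.0 * f for f in melody_hz]      # same contour, greater pitch height

melody_shape = centre(hz_to_semitones(melody_hz))
voice_shape = centre(hz_to_semitones(voice_hz))

# The two centred contours coincide even though the raw F0 values differ.
same_contour = all(abs(a - b) < 1e-9 for a, b in zip(melody_shape, voice_shape))
```

On this view, a voice that is higher overall than the instrumental can still reproduce its contour faithfully.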

Unlike the results of the other music motifs, the pitch contour of the Flagpole motif and
those of the spoken and sung interpretations are only similar at the middle and the end of the
tone sequences for most participants. For example, at the beginning of the tone sequence,
the pitch contour of the verbal interpretation rises instead of falling like the Mario source.

Figure 9: Pitch contours of Overworld in Mario, speech and song

Figure 10: Pitch contours of Death motif in Mario, speech and song


With the line plots alone, it is difficult to quantify the strength of the relationship
between the F0 contours of the Mario motifs and the corresponding linguistic interpretations.
To understand the strength of the relationship between the pitch contour of the videogame
music and the corresponding spoken and sung interpretations, I turn to the results of the
Pearson correlation analysis. A summary of all the results is presented in Table 1.

Figure 11: Pitch contours of Flagpole motif in Mario, speech and song


As shown in Table 1, there is a positive linear relationship between the pitch contours
of the Mario instrumentals and their linguistic interpretations in song or speech
mode, but the strength of the relationship varies with the Mario instrumental and the verbal
interpretation mode. Regardless of the mode of verbal interpretation, the relationship
between the Mario motifs and their corresponding linguistic interpretations is stronger for
the Underworld and Overworld motifs (R ≥ 0.58) than for the Death and Flagpole motifs. Relative
to the pitch contours of the speech renditions, the pitch contours of the song renditions are
more closely related to the pitch contours of the Mario instrumentals. With the exception of the
Death and Flagpole tunes, the relationships between the pitch contours of all the music
motifs and their spoken and sung interpretations are statistically significant.


Table 1: Correlation coefficients of Mario motifs and the spoken/sung interpretations 


Motif        |   | SPEECH vs SONG | MARIO vs SPEECH | MARIO vs SONG
Underworld1  | R | 0.98           | 0.94            | 0.96
             | p | <0.001         | <0.001          | <0.001
Underworld2  | R | 0.25           | 0.92            | 0.93
             | p | <0.001         | <0.001          | <0.001
Overworld    | R | 0.93           | 0.58            | 0.75
             | p | <0.001         | 0.001           | 0.001
Death        | R | 0.78           | 0.31            | 0.7
             | p | <0.001         | 0.172           | <0.001
Flagpole     | R | [illegible]    | 0.2             | 0.13
             | p | <0.001         | 0.34            | 0.537


The results also show that there is a positive relationship between the pitch contours
of the interpretations in speech and song modes, but the degree of similarity varies
depending on the Mario instrumental being interpreted. If we take into account that vowel
and consonant types affect the pitch value of a tone (Hombert, 1977; Whalen and Levitt,
1995; Whalen et al., 1999), the varying degree of positive correlation between the Mario
motifs and their corresponding spoken and sung interpretations can be considered an effect
of segmental features on the pitch values of a tone sequence. The effects of segments on
the degree of similarity become apparent when we compare the results of Underworld1 and
Underworld2. That Underworld1 and Underworld2 have slightly different correlation
coefficients is plausibly an effect of the segmental properties of their corresponding sung and
spoken interpretations, considering that Underworld1 and Underworld2 have the same
sequence of music notes and the same tone sequences in their sung and spoken
interpretations. Put differently, the segmental distinction between Underworld1 and Underworld2
must be the factor that contributes to the varying degree of similarity between the music
motifs and their corresponding spoken and sung interpretations.

The results of the verbal interpretations are similar to those reported for the Yoruba talking
drum. As reported in Akinbo (2019), Yoruba drummers communicate by distinctively imitating the
pitch contours of tones with their native drums and by representing each syllable with a
drum strike. The results of that study show that there is a strong positive relation between
the pitch contours of Yoruba tones and those of their corresponding drum renditions. The
representation of Yoruba tones and syllables with a talking drum is consistent with sound
imitation: “a process by which an individual either vocally or non-vocally generates sounds
with qualities that reproduce elements of previously experienced sounds” (Mercado III
et al. 2014: 39).

Despite the similarities between the linguistic interpretation of the Super Mario music
and the talking drum rendition of Yoruba words, there are certain differences. Tone-based
speech surrogates do not encode segmental features (McPherson, 2019), but the vocal
imitation of the Super Mario music contains segmental features. When we compare the
phonetic properties of the linguistic interpretations to those of the Super Mario music
motifs, we observe that the music notes of the Super Mario music motifs do not contain
information about segmental properties (see Jackendoff and Lerdahl, 2006; Jackendoff, 2009).
So, this raises the question: what determines the segment or morpheme that is matched to
the musical patterns of the Super Mario music motifs? In the next section, we focus on this
issue.

Another interesting aspect of the results involves the correlation between the pitch
trajectories of the interpretations in speech and song modes. Although there is a positive
correlation between the speech and song renditions of the interpretations, the strength of the
relationship varies depending on the Mario instrumental being interpreted.
In fact, the correlation between the pitch contours of the speech and song for the Death
tune instrumental is not statistically significant. This is in line with the hypothesis that the
pitch trajectories of a music melody in a tone language are not determined by language, but
music can accommodate language when it is musically feasible (Schellenberg, 2012, 2013;
McPherson and Ryan, 2018).

The discussion in this section is summarised as follows. Yoruba gamers linguistically
interpreted the Super Mario music motifs by mapping the pitch contours of the music notes
to Yoruba phrases with similar pitch contours. Specifically, the gamers achieved this
pitch-contour matching by modifying a suitable Yoruba idiom or generating strings of words with
pitch contours similar to those of the instrumentals. In this case, the tone sequences of the
linguistic interpretations are determined by imitating the pitch contour of the videogame music.


6. Formal analysis and implications of musical meaning in Yoruba 


The goal of this section is to present a formal account of how Yoruba gamers interpreted
the Super Mario music motifs. For formal accounts of musical meaning, Patel (2008)
identifies two approaches, namely semantic and pragmatic. In the semantic approach, the idea
is that instrumental music can prime representations of meaningful concepts (Hacohen and
Wagner, 1997; Koelsch et al., 2004; Steinbeis and Koelsch, 2011). Under the pragmatic
approach, musical meaning is derived through contextual information and multimodal
channels (Feld, 1984; Agawu, 2001; Patel, 2008; Garcia et al., 2014; Neumeyer, 2015). To
account for the linguistic interpretation of the Super Mario motifs, I adopt the music
interpretive moves of Feld (1984), which can be considered a pragmatic approach.

In his work on music communication, Feld (1984) argues that musical structures exist
in a social construct, and they have meaning through social interpretations. In this
approach, musical meaning is derived from both the internal structure of musical discourse
and the situational contexts relating to musical performance or consumption. Feld (1984)
proposes a music communication model, which is schematically represented in Figure 12,
as the interpretive move for musical meaning. In the model, the music communication
process involves two dynamically linked components, namely dialectics of sound objects and
interpretive moves. The interpretive moves contain five elements: (i) a locational element
which relates a sound object to an appropriate range within a subjective field of like or
unlike items/events; (ii) a categorical move which relates sound to things (e.g. anthems and
patriotic songs); (iii) an associational element which relates the sound event to a visual,
musical or verbal imagery; (iv) a reflective element which relates a sound object to
personal and social conditions, and experiences where things like and unlike the object can be
heard; (v) an evaluative move which involves an affective meaning (Feld 1984: 8). While
all or some of the five elements might be utilised in the interpretive moves, the element
utilised in an interpretive move must interact with the identity of the listener, an expressive
ideology and world sense coherence.

Figure 12: Interpretive moves (Feld 1984: 9)

The interpretive moves proposed in Feld (1984) can account for the interpretation of
the Super Mario motifs. For this purpose, I refer to the discussion in the previous
section. The discussion suggests that the Yoruba gamers used their cultural experience, vocal
imitation and situational contexts of the game in their interpretation of the Super Mario
instrumentals. Drawing on their native cultural experience in the interpretation of the
Super Mario music motifs can be considered a reflective interpretive move. That the gamers
imitated the pitch contours of the instrumental music can be accounted for with a locational
interpretive move, which involves relating a sound object to an appropriate range within a
subjective field of like or unlike items. In this case, the music motifs of the game are
possibly likened to the talking-drum music in Yoruba. Given that the associational interpretive
move involves relating a sound event to a visual, musical or verbal imagery, the use of the
visual events in the game for the interpretation of music is consistent with an effect of the
associational element.

I have classified the formal account of musical meaning in Feld (1984) as a pragmatic
approach, but the pragmatic approach to musical meaning, as conceptualised by Patel
(2008), solely focuses on the internal structure of the musical phrase. Patel (2008) argues for
a pragmatic approach as a more efficient way of studying musical meaning. Central to the
pragmatic approach in Patel (2008) is the coherent structure of discourse. For instance, the
sentences or clauses in (7a) are coherent because they are related in systematic ways and
form a unified meaningful whole. In line with the maxim of relevance (also relation) in
Grice’s Cooperative Principle (Grice, 1975; Leech, 1983), the second utterance is relevant
because it contributes to the communicative goal.


(7) Linguistic discourse (from Hobbs, 1979) 
a. Coherent [John can open Bill’s safe]. [He knows the combination]. 
b. Incoherent [John took a train from Paris to Istanbul]. [He likes spinach].


By assuming that coherence relations between clauses in linguistic discourse are analogous
to coherence relations between phrases and motifs in musical discourse, Patel (2008)
suggests that investigating coherence in musical discourse could offer more valuable insights
on musical meaning. The pragmatic approach, as proposed by Patel (2008), predicts that
the internal structure of a musical piece, such as the number of musical notes, should have
an effect on musical meaning. An argument in support of the pragmatic approach in Patel
(2008) is an experiment on the cognition of raj nplaim music, which is a musical speech
surrogate of the Hmong people (Poss, 2012). The results of the experiment show that the
participants were able to match Hmong words to the melodies and rhythm of the raj nplaim
instrument, but they found it easier to match phrases to longer musical melodies. That the
participants performed better on longer musical phrases could be an effect of coherence.
The prediction of the pragmatic account in Patel (2008) is not compatible with the
interpretation of Mario instrumentals by Yoruba gamers, considering that they used the context of
musical performance in addition to imitating the pitch sequences of the instrumental music.

I now compare the account in this work to the semantic approach, which holds that
musical features, such as pitch, tempo and rhythm, are capable of priming affective and
non-affective meanings (Scherer and Oshinsky, 1977; Kellaris and Kent, 1993; Gagnon
and Peretz, 2003; Webster and Weir, 2005; Steinbeis and Koelsch, 2011; Eerola et al.,
2013; Dalla Bella et al., 2001; Mote, 2011). While studies adopting the semantic approach to
musical meaning ignore the role of situational context in musical meaning, the evidence
for musical meaning from such studies comes from cognitive experiments that involve
pairing music with external stimuli, such as pictures and words (e.g., Koelsch et al. 2004:
302).

There is evidence to suggest that music can prime semantic concepts, but as Patel
(2008: 334) notes, it does not mean “that music has a semantic system on par with
language”. For instance, unlike human languages, where vocal or visual signals without any
external stimuli can have a semantic meaning, instrumental music without a matching
word or any other stimulus has not been shown to elicit semantic processing (see Hacohen
and Wagner, 1997; Steinbeis and Koelsch, 2011; Koelsch, 2005). Of course, the prediction
of the semantic approach might not hold in a music tradition with musical speech
surrogates like Yoruba. For instance, a specific pitch value or tone might be associated
with words that refer to opposing concepts such as happiness and sadness. If music really
elicited semantic processing, the music notes in the Underworld motif of Mario and its
repetition would have been assigned the same meaning. Considering that Yoruba gamers
utilised their cultural background and visual information of the game in their interpretation
of the background music, it is highly possible that the meaning of the instrumental music in
music-word pairing experiments (e.g., Koelsch et al., 2004; Steinbeis and Koelsch, 2011)
is derived from coupling music with the visual signals, not music alone. Following from
this, the semantic account of musical meaning is not compatible with the interpretation of
Mario instrumentals by Yoruba gamers.

The discussion in this section has shown that the mapping of linguistic meaning to
musical sound is better captured by a model that considers the internal structure of the
musical phrase, the contexts of music performance and the cultural background of music
consumers.


7. Discussion, summary and conclusion 


This study has investigated the linguistic interpretation of Super Mario musical motifs by
Yoruba gamers. The results of the study suggest that the instrumentals were interpreted
through multimodal channels or moves which might be unordered. The first move involves
one-to-one mapping between syllables and music notes of the instrumentals. To determine
the tones of the syllables, the second move involves mapping the pitch contours of the
music motifs to Yoruba phrases with similar pitch contours. However, the degree of similarity
between the pitch contours of the music motifs and their linguistic interpretations varies
depending on the segments of the sung and spoken interpretations. In the third move, the
choice of segments or morphemes that are mapped to the music is determined by the
situational contexts of the musical performance. In this case, situational contexts include the
visual events surrounding the instrumental music in the game, non-virtual events
comparable to the visual events in the game and the cultural background of the listeners.

As reported in studies on Yoruba talking drums, the communicative capability of the
speech surrogates is based on mapping the pitch contours of speech forms to the pitch
sequences of musical forms (Villepastour, 2010, 2014; Akinbo, 2019, 2021b; Gonzalez and
Oludare, 2022). That the gamers vocally imitated the pitch contours of music notes
suggests a similarity between musical speech surrogates and linguistic interpretation of music.
The strategies used in talking-drum communication and the interpretation of Super Mario
music strongly suggest that the mapping of music to linguistic meaning and vice versa
involves sound imitation.

The results of this study suggest some parallels between sound-meaning mapping in 
music and language. The first parallel between mapping vocal signals and music forms to 
meaning is imitation. For instance, it is well established that the meaning of onomatopoeic 
words is based on perceptual resemblance between referent and linguistic form (Assaneo 
et al., 2011; Tsur, 2001; Bezat et al., 2014). While imitation has been recognised as one 
of the parallels between language and music (Patel, 2008; Jackendoff, 2009), the role of 
imitation in musical meaning has mostly gone unnoticed. This is probably because studies 
on musical meaning mostly focus on data from music traditions without speech surrogates 
and from experimental conditions, not natural contexts of music consumption. Even when 
research focuses on speech surrogates, the role of imitation in decoding the messages of 
speech surrogates is only alluded to (Armstrong, 1954; Villepastour, 2010, 2014; Agawu, 
2016). This work presents natural evidence for imitation as a strategy for musical meaning
in a culture with speech surrogates.

Another parallel between sound-meaning mapping in language and music involves
the role of contextual clues. Studies suggest that contextual clues not only support
language acquisition but also enhance the retrieval of previous knowledge about sound objects
in sound-meaning mapping (Fraser, 1999; Cetinavci, 2014; Nation, 2015; van den Broek
et al., 2018). Similar to sound-meaning association in spoken language, the results of the
present study indicate that the interpretation of the music motifs is context-dependent. That
said, there is reason to believe that musical meaning in traditions with speech surrogates
may also be context free if we consider that the communicative capability of speech surrogates
mostly lies in their ability to imitate linguistic features (Akinbo, 2019, 2021b; McPherson,
2019).

The verbal imitation of instrumental music is comparable to warblish, which is a
“non-onomatopoeic verbal mimicry of avian vocalisation” (Sarvasy, 2016, p. 766). Just as
in the verbal imitation of Super Mario music by Yoruba gamers, context and imitation play a
role in English warblish, but the verbal imitations in English are mostly incoherent. Future
research should investigate whether this incoherence stems from the fact that speech surrogacy
is prominent in the Yoruba music tradition but not in the English one.

The fact that the gamers only interpreted a portion of the game music is consistent
with drum performance and perception. For example, the drum performance in
traditional Yoruba music involves switching between purely musical rhythm and speech-like
rhythm used in the context of surrogacy (Euba, 1990; Villepastour, 2010; Agawu, 2016).
To a large extent, a Yoruba speaker is aware when a drummer switches from purely
musical rhythm to speech-like rhythm and vice versa. This is established in the perceptual
experiment of Durojaye et al. (2021a). The logical move for future research is to compare
the rhythm of Yoruba speech and song to the rhythm of the (un)interpreted portions of the
videogame music (as in Patel and Daniele, 2003).

Given that the linguistic interpretations reported here developed naturally and became
conventionalised over many iterations of generational turnover, there are certain limitations
of the present study. For instance, we are uncertain whether, in addition to the factors
mentioned in this work, other factors play a role in the linguistic interpretation of the Super
Mario music. It would be interesting to investigate whether there is a meaningful distinction
between interpreting music on the fly and a conventionalised pattern, as is the case here.
An important issue to address in future studies, in a controlled environment, is how the
context of music performance contributes to the linguistic interpretation of instrumental music.
For such studies, embedding the musical stimuli as the background music of a specially
designed videogame could be promising. Such studies will contribute not only to the fields of
linguistics, psychology and music but also to the emerging field of videogame music, where there
is a consensus that the background music of a videogame increases the level of “cognitive
immersion” or “being in the game” (e.g. Zehnder and Lipscomb, 2006; Grimshaw, 2008;
Nacke et al., 2010; Sanders and Cairns, 2010; Fu, 2015). The use of cultural experience
and situational contexts in the interpretive moves by the Yoruba gamers is possibly because
music in African tradition is strongly functional, linked to dance and externally motivated
by social and musical contexts (Agawu, 2006, 2001). Future research should investigate
whether functional and non-functional music are interpreted differently.

It is unknown how these interpretations developed. Apart from the fact that the
interpretations developed in South-West Nigeria, no one knows the exact time and place where
they started. The significance of these interpretations for the gamers has also not been studied.
While all these are relevant issues, we might only be able to address a few of them
in future research.

To conclude, Yoruba gamers linguistically interpreted instrumental music by vocally
imitating the acoustic features of the music motifs. By considering the immediate contexts
of musical performance and the social expectation about music in the relevant contexts, the
music consumers assigned lexical meaning to the music motifs. Given the immersive power
of videogames, I strongly recommend videogames as an experimental tool for investigating
the role of context in music and language studies.